
Allow database dumps to be run against the read-only replica #2144


Merged
merged 4 commits into rust-lang:master from prod/fix-db-dump-fallback
Feb 5, 2020

Conversation

jtgeibel
Member

@jtgeibel jtgeibel commented Jan 20, 2020

Edit: The description has been updated, with the original message included below.

This PR updates the database dump script so that it can be run against the read-only replica database.

I've tested locally, but we can manually schedule a database dump against the read-only replica and then update the existing daily scheduled task.

Original PR Description

In production, the scheduler was configured to run:

```
./target/release/enqueue-job dump_db $READ_ONLY_REPLICA
```

instead of:

```
./target/release/enqueue-job dump_db $READ_ONLY_REPLICA_URL
```

Because of the typo in the variable name, the dump_db job has been
running against the primary database.  This patch modifies the fallback
to ensure the primary database is only used if explicitly provided.
@rust-highfive

r? @carols10cents

(rust_highfive has picked a reviewer for you, use r? to override)

@carols10cents
Member

So it turns out it's actually good that the scheduler was configured incorrectly; the dump_db task can't currently run on read-only databases because it creates views :( (see crates-io-incident-response for more info)

@sgrif
Contributor

sgrif commented Jan 20, 2020

We will need to rethink how this script works; it cannot currently be run against the replica (and changing the scheduler config to hit it caused the job to begin erroring). The issues we need to address are:

  • CREATE VIEW cannot be run in a read only transaction
  • SERIALIZABLE transactions are not permitted on a hot standby

This addresses the following error:

```
sql_error_code = 0A000 ERROR:  cannot use serializable mode in a hot standby
sql_error_code = 0A000 HINT:  You can use REPEATABLE READ instead.
sql_error_code = 0A000 STATEMENT:  BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;

```

Because our application does not rely on any other `SERIALIZABLE`
transactions, it should be fine to relax the isolation level for
database dumps to `REPEATABLE READ`.

`DEFERRABLE` is dropped because it is only meaningful when combined
with `SERIALIZABLE`.
@jtgeibel jtgeibel changed the title Fallback to READ_ONLY_REPLICA_URL for db_dump Allow database dumps to be run against the read-only replica Jan 21, 2020
@jtgeibel
Member Author

Sorry again for the confusion on this stealth configuration change. I should have announced it in the appropriate channel.

I've pushed 2 additional commits to avoid creating temporary views and to switch to the "REPEATABLE READ" isolation level.

I've tested locally (against a normal database), but we can manually schedule a database dump against the read-only replica and then update the existing daily task if that is successful.

@sgrif
Contributor

sgrif commented Jan 21, 2020 via email

```diff
@@ -1,19 +1,7 @@
-BEGIN;
+BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY;
```
Contributor

This provides somewhat weaker consistency guarantees for the snapshot used for the database dump. The Postgres documentation explicitly recommends using SERIALIZABLE READ ONLY DEFERRABLE for backups.

We don't seem to have a very strong case for moving the dump to the read-only replica, since it doesn't seem to have caused any problems on the main replica. Neither does it seem unreasonable to do so, even with the weaker consistency guarantees.

Contributor

One more question – are we sure REPEATABLE READ will work on the hot spare? It isn't obvious why SERIALIZABLE READ ONLY would not work, so we should test this on stage.

Member Author

are we sure REPEATABLE READ will work on the hot spare?

I definitely expect this to work. We can test by deploying the code, manually scheduling an export, verifying that everything works, and then finally updating the daily task.

It isn't obvious why SERIALIZABLE READ ONLY would not work

This definitely was a surprise to me as well. But I hope my other response helps explain why it isn't possible for the read-replica to be directly involved in cycle tracking on the leader. I've seen several suggestions that using DEFERRABLE from a replica could be supported in future releases by writing some additional data in the WAL so that the follower can find a valid snapshot to start from. It appears no such enhancement has been implemented yet.

we should test this on stage.

We can't actually test this on staging, because Heroku only supports read-only replicas on paid database plans. But we shouldn't have any issues merging and deploying this change as we can roll it out via a configuration change to the scheduler.

We don't seem to have a very strong case for moving the dump to the read-only replica, since it doesn't seem to have caused any problems on the main replica.

I'm not entirely sure about that. I don't think any of the temporary response time spikes can be traced to the export specifically, but I've definitely seen slight performance improvements and much less variation in response times after moving just a few expensive endpoints (#2073) to the read-only replica. In total, that PR seems to have moved only around 600 queries per hour off of the primary database, but the improvements seem impressive. It also appears to have eliminated the occasional spikes in download response times. I've collected some rough numbers and plan to follow up on that PR with more details, but I would definitely prefer to run the export from there as well.

Contributor

I'm perfectly fine with moving to the read replica, and with the additional explanations you gave I'm actually actively in favour of doing so.

```diff
@@ -13,7 +13,7 @@ fn main() -> Result<(), Error> {
     match &*job {
         "update_downloads" => Ok(tasks::update_downloads().enqueue(&conn)?),
         "dump_db" => {
-            let database_url = args.next().unwrap_or_else(|| env("DATABASE_URL"));
+            let database_url = args.next().unwrap_or_else(|| env("READ_ONLY_REPLICA_URL"));
```
Contributor

I assumed we are passing in the replica URL on the command line in production.

Could we add something like this to .env.sample, to make local testing easier?

```
export READ_ONLY_REPLICA_URL=$DATABASE_URL
```

(I'm not sure variable expansion works in that file.)

Member Author

I'm not sure variable expansion works in that file.

I think that does work, but I don't recall for certain. In this case, I think I would prefer to remove the fallback entirely and add an error message suggesting to pass $DATABASE_URL via the shell.
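
One way the suggested change could look, as a hypothetical sketch (the actual commit may differ; `database_url` is an invented helper name for illustration):

```rust
// Hypothetical sketch of the suggestion above: drop the implicit fallback
// and fail with a hint when no URL argument is provided, so the primary
// database is only used when passed explicitly.
fn database_url(arg: Option<String>) -> String {
    arg.unwrap_or_else(|| {
        panic!(
            "No database URL was passed to dump_db. Pass $DATABASE_URL \
             explicitly via the shell to run against the primary database."
        )
    })
}

fn main() {
    // Mirrors `args.next()` producing Some(url) when a URL is given.
    let url = database_url(Some("postgres://replica".to_string()));
    assert_eq!(url, "postgres://replica");
}
```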

Contributor

Sounds good.

@sgrif
Contributor

sgrif commented Jan 21, 2020 via email

@jtgeibel
Member Author

jtgeibel commented Jan 22, 2020

Long response incoming because this is subtle, I've done some recent reading, and I think it's worth documenting my current mental model.

Transaction Isolation

There are 3 isolation levels available in PostgreSQL:

  • READ COMMITTED - The default isolation level, and as far as I can tell only the db_dump logic uses a more strict level. The transaction has a "live" view of the database, and later queries can see the results of concurrent (committed) transactions that were not available to earlier queries (as it wasn't committed at that time).
  • REPEATABLE READ - The transaction takes a snapshot of the database upon first access, and concurrent writes by other transactions will not be seen by this transaction. If a transaction does some write operation, then a serialization error may be raised if our change conflicts with some other transaction (sort of like a git merge conflict). The application is expected to repeat the transaction, upon which it will start from a newer snapshot and hopefully does not conflict with newer concurrent writes (kind of like rebasing a PR).
  • SERIALIZABLE - This is the strictest level and ensures that (relative to other SERIALIZABLE transactions) it will be possible to write down some (i.e. at least one) sequence of transactions that explain the final state of the database. Transactions may still be run in parallel, but it appears as if the transactions had been done in series. Applications and their queries do not need to care about concurrency at all, but this comes at a performance cost and transactions must still be retried if rejected due to serialization errors.

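To make the snapshot behaviour concrete, here is a toy Rust model (not real Postgres; all names are invented for illustration) of the difference between a live READ COMMITTED view and a REPEATABLE READ snapshot:

```rust
// Toy model (not a real database) of REPEATABLE READ snapshot semantics.
#[derive(Clone, Copy)]
struct Db {
    value: i32,
}

struct RepeatableRead {
    snapshot: Option<Db>,
}

impl RepeatableRead {
    fn new() -> Self {
        RepeatableRead { snapshot: None }
    }

    // The first read captures a snapshot; later reads reuse it, so
    // concurrent committed writes stay invisible to this transaction.
    fn read(&mut self, live: &Db) -> i32 {
        self.snapshot.get_or_insert(*live).value
    }
}

fn main() {
    let mut live = Db { value: 0 };
    let mut txn = RepeatableRead::new();

    assert_eq!(txn.read(&live), 0); // snapshot taken on first access

    live.value = 42; // a concurrent transaction commits a write

    assert_eq!(live.value, 42); // a READ COMMITTED view sees the new value
    assert_eq!(txn.read(&live), 0); // the REPEATABLE READ snapshot does not
    println!("snapshot isolation holds");
}
```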
Let's look at the final level a bit more closely.

SERIALIZABLE

The key example I've seen goes roughly as follows. You have two transactions running in parallel, with a cyclic data dependency such as a = b + 1 and b = a + 1. Then, if we start with let (a, b) = (0, 0); there are 3 possible outcomes for non-SERIALIZABLE transactions:

  • If the first transaction runs first, and then the second, we end in the (1, 2) state. ((0, 0) -> (1, 0) -> (1, 2))
  • If the transactions run sequentially in the opposite order, we end in the (2, 1) state. ((0, 0) -> (0, 1) -> (2, 1))
  • If the transactions overlap concurrently, it is possible they both start with (0, 0) as the snapshot state and then each write a 1 into the LHS of the assignment. The final state is (1, 1) and there is no way to explain this as if the transactions had occurred in series. In a sense, the database abstraction is leaky and the effects of concurrency become externally observable.

The SERIALIZABLE isolation level is all about eliminating the last bullet point. The database tracks things at runtime and will break data cycles by aborting one of the transactions so it can be retried (with a newer snapshot). The database tries to minimize this runtime cost, but some workloads may trigger pathological behavior (ex: false positives where a transaction is aborted when it could have been committed).
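
The write-skew outcome in the last bullet can be replayed in a few lines of Rust (a toy illustration, not database code):

```rust
// Toy replay of the write-skew example: both transactions snapshot
// (a, b) = (0, 0), then each writes lhs = other + 1. The concurrent
// outcome (1, 1) matches no serial order, which is exactly what
// SERIALIZABLE would detect and abort.
fn main() {
    let (a0, b0) = (0, 0);

    // Both transactions read from the same (0, 0) snapshot and commit
    // without seeing each other's write.
    let (a, b) = (b0 + 1, a0 + 1);
    assert_eq!((a, b), (1, 1));

    // txn1 then txn2: (0,0) -> (1,0) -> (1,2)
    let serial_12 = { let a = b0 + 1; let b = a + 1; (a, b) };
    assert_eq!(serial_12, (1, 2));

    // txn2 then txn1: (0,0) -> (0,1) -> (2,1)
    let serial_21 = { let b = a0 + 1; let a = b + 1; (a, b) };
    assert_eq!(serial_21, (2, 1));

    println!("concurrent outcome (1, 1) matches no serial order");
}
```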

REPEATABLE READ is a pretty strong isolation level, but the lower level concurrency leaks into the application logic and it may be necessary to add explicit locking to queries. SERIALIZABLE does not expose the application to the results of concurrency, but it may not be as efficient as explicit locks.

In 2004 it was discovered that things can be even more subtle. It's possible to describe most concurrency edge-cases with 2 transactions, but this new anomaly requires 3 concurrent transactions to describe.

Serializable read-only anomaly

First, let's cover a bit of notation. We assume a single bank customer with two account balances (checking and savings). The database state is two atomic values represented as a tuple (checking, savings): (AtomicUsize, AtomicUsize). The Rust _ notation is used to show that there is no read/write operation to that value in a particular step. For example read (_, 10) means that 10 was read from savings and the value of checking was not read. write (5, _) means write a 5 to checking. The statement write (0, 0) is not valid, because it is not possible to modify multiple atomics simultaneously. Such a write would be split into 2 operations. Of course read (0, 0) is 2 sequential operations as well, this notation just reflects that the transaction doesn't care about the exact sequence.

The database starts in state (0, 0), and there are 3 concurrent transactions:

  • The withdraw transaction wants to subtract $10 from checking. If the total account balance checking + savings becomes negative, then an overdraft charge of $1 is also withdrawn.
  • The deposit transaction wants to add $100 to savings.
  • The check_balance read-only transaction which reads both values.

The withdraw and deposit transactions write to separate values, and do not form a write-cycle as demonstrated previously. On their own, it appears the two transactions need no synchronization. The only data dependency is between a write to savings by deposit and a read from savings by withdraw. If the write occurs first, then no fee is charged and the final state is (-10, 100). If the read from withdraw occurs first then a fee is charged and the final state is (-11, 100).

Surprisingly, introducing the read-only check_balance transaction into the mix can cause a serialization anomaly unless handled with care. The scenario is as follows:

  • withdraw transaction - read (0, 0) and decide that an overdraft fee must be charged.
  • deposit transaction - read (_, 0), write (_, 100).
  • check_balance transaction - read (0, 100) - the customer observes a state where the deposit appears to have occurred first.
  • withdraw transaction - write (-11, _).

At the end of the month, the bank runs check_balance again and sends the customer a statement showing a balance of (-11, 100). This is consistent with the series of events where withdraw occurred first and was later followed by deposit. Conveniently, this interpretation aligns with the physical sequence of events where the read did occur before the write to savings.

Less conveniently, when the customer checked their balance earlier they saw an intermediate state where the deposit had cleared and no withdrawal had yet occurred. From their perspective, the only possible serial sequence that explained that state at the time was that any future withdraw would see the new balance and would not charge a fee.
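
The four anomaly steps above can be replayed as a toy Rust sketch (illustrative only; the variable names and arithmetic are just the notation from this comment, not database code):

```rust
// Toy replay of the serializable read-only anomaly, using the
// (checking, savings) notation from the discussion above.
fn main() {
    let (checking, savings) = (0i32, 0i32);

    // withdraw reads (0, 0): the balance after withdrawing $10 would be
    // negative, so it decides to charge the $1 overdraft fee.
    let total_seen_by_withdraw = checking + savings;
    let fee = if total_seen_by_withdraw - 10 < 0 { 1 } else { 0 };
    assert_eq!(fee, 1);

    // deposit commits: write (_, 100).
    let savings = savings + 100;

    // check_balance snapshots here and observes (0, 100): to the
    // customer, the deposit appears to have happened first.
    assert_eq!((checking, savings), (0, 100));

    // withdraw finally commits: write (-11, _).
    let checking = checking - 10 - fee;

    // The final state says "withdraw before deposit", contradicting the
    // earlier check_balance observation -- the anomaly.
    assert_eq!((checking, savings), (-11, 100));
    println!("anomaly reproduced");
}
```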

To resolve this anomaly, if the first 2 bullet point steps have already occurred then the database has only two options:

  • Block the creation of the snapshot for check_balance until after withdraw is committed.
  • Reject either withdraw or check_balance and cause that client to restart from a newer snapshot.

This brings us to DEFERRABLE. It only applies if the transaction is already SERIALIZABLE, READ ONLY and indicates to the database that we are willing to block initially, waiting for concurrent writes to finish, before creating a snapshot. This is a great optimization for the database, as it no longer needs to include reads from this transaction in its runtime SERIALIZABLE tracking.

Conclusion

I'll follow up on a few other comments, but overall my assessment is that running the database export with REPEATABLE READ will still provide stronger guarantees than what we currently expose via our API (READ COMMITTED). REPEATABLE READ still provides a consistent snapshot of the database state for the export, it just doesn't participate in any of the more complicated SERIALIZABLE runtime checks.

If we later determine our API queries need stronger guarantees somewhere, we would probably prefer to add explicit locking to those queries. Even if we found a use case where we preferred to use SERIALIZABLE in our application, we could probably just document that the database exports "don't participate in complex concurrency situations" and "may expose internal intermediate states that appear to conflict with the eventual future state of the database".

@smarnach
Contributor

smarnach commented Feb 3, 2020

@jtgeibel Thanks for taking the time for a detailed answer! I agree with your conclusion – given that we don't have any other transactions with increased isolation levels, there currently shouldn't be an observable difference. My main concern when introducing the isolation level was to make sure that at least all foreign key relations still work, so it's possible to re-import the data into the database. With READ COMMITTED, not even that would be guaranteed, but REPEATABLE READ should be fine in that regard.

@jtgeibel
Member Author

jtgeibel commented Feb 5, 2020

@bors r=smarnach

@bors
Contributor

bors commented Feb 5, 2020

📌 Commit 4ae90e6 has been approved by smarnach

@bors
Contributor

bors commented Feb 5, 2020

⌛ Testing commit 4ae90e6 with merge 7072305...

bors added a commit that referenced this pull request Feb 5, 2020
Allow database dumps to be run against the read-only replica

@bors
Contributor

bors commented Feb 5, 2020

☀️ Test successful - checks-travis
Approved by: smarnach
Pushing 7072305 to master...

@bors bors merged commit 4ae90e6 into rust-lang:master Feb 5, 2020
@jtgeibel jtgeibel deleted the prod/fix-db-dump-fallback branch March 8, 2020 05:18